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Abstract 

The forces of contact during manipulation convey substantial information about the state of the manipula- 
tion. Textures, slip, impacts, grasping, and other contact conditions produce force and position signatures 
that can be used for identifying the state of contact. This paper address the fundamental problems of 
interpreting the force signals without any additional context on the state of manipulation. Techniques 
based on forms of the generalized sequential likelihood ratio test are used to segment individual strain 
signals into statistically equivalent pieces. The results of the segmentation are designed to be used in a 
higher level procedure which will interpret the results within a manipulation context. We report on our 
experimental development of the segmentation algorithm and on its results for detecting and labelling 
impacts, slip, changes in texture, and condition. The sequential likelihood ratio test is reviewed and some 
of its special cases and optimal properties are discussed. Finally, we conclude by discussing extensions to 
the techniques and lessons for sensor design. 
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1 Introduction 

A tremendous amount of information is available in the 
contact forces of manipulation. Figure 1 shows a spec- 
trogram of an impact event. The impact results in an 
increase in energy at all frequencies locally around the 
event, and a persistent residual vibration at the sensor's 
natural frequency. This signal, like all contact force sig- 
nals, can be broken into regions that are similar. We 
term the different regions contact states and the transi- 
tions between regions contact events. 

During a manipulation task like grasping, pushing, or 
typing, numerous such events occur. One approach to 
robot programming, for any of these tasks, is to write a 
program component that detects and labels each event 
and a component which generates an appropriate action 
based on the sensed event. The two components must 
mesh because each provides a decision or knowledge con- 
text for the other. The current action selects possible 
interpretations of sensor signals, and the changes in sen- 
sor signals guide the choice of new actions. In general, 
most robot programming has used an action centered 
paradigm. That is the programmer first determines the 
sequence of actions that should be performed and then 
determines how to use the available sensors to guide the 
actions. The guarded move is the typical sensing/action 
strategy that results. Brock [8] provides a recent study 
and review of this approach. 

This paper is based on a sensing centered program- 
ming paradigm. In this approach, the properties char- 
acteristic of the sensor signals alone are determined and 
then an event detector for changes in those properties 
is designed. Ideally these properties are not biased by 
any particular task model and therefore are useful for all 
tasks. Because the approach is sensor based, the types 
of allowed contexts are controlled by the possible inter- 
pretations given the sensor measurements. The actions 
are then designed based on these possible contexts. This 
paper discusses our work on identifying characteristic 
properties of contact force signals. 

In manipulation, force and position events guide the 
task. This paper focuses on force signals since little work 
has been done on interpreting these signals during com- 
mon tasks. Earlier investigations of this problem, by 
other researchers, focused primarily on designing and 
proving sensor technologies tailored to different contact 
events. In contrast, this paper presents algorithms, de- 
rived from the theory of sequential hypothesis testing, 
that are designed for detecting contact events and label- 
ing contact states. 

Labeling is much more difficult then detecting changes 
in signal characteristics. In general, labeling requires 
knowledge about the possible sources of the signals and 
the source characteristics. This research looked at label- 
ing some simple events without context. 

To our knowledge, this is the first investigation of the 
discrimination of contact events and the first application 
of sequential decision theory to this problem. Although 
we applied the techniques to signals from an intrinsic 
contact sensor [5], the ideas can be applied to any form 
of tactile sensor to enhance performance. Based on the 
experimental results in this paper, we are investigating a 



sensor that is a combination of the intrinsic contact sen- 
sor and a piezoelectric film. This sensor, which is similar 
in design to [12], has better dynamic characteristics and 
should give better performance. 

In the following sections, we first summarize the work 
that has been done on detecting grasping events and 
on human models of temporal tactile sensing. We then 
briefly discuss our experiments and the signal models. 
Then sequential hypothesis testing is introduced and 
used to derive the necessary statistical tests. Next our 
algorithm is presented and related to the general theory. 
We then present the experimental results of our algo- 
rithm, discuss its theoretical characteristics and compare 
it to other techniques. We conclude by discussing the re- 
search issues in contact event perception, and our plans 
for future development. 

2 Previous Work 

During the last decade, considerable research has been 
performed on tactile sensing. Howe [13] provides the 
most recent comprehensive review of current and past 
research. Most of this research has focused on design- 
ing surface array sensors and using these sensors for ob- 
taining geometric information from static measurements. 
Some research has looked at the information that can 
be acquired by actively moving the contact sensor and 
monitoring both the sensor and joint locations. This is 
termed haptic sensing. Primarily prior haptic research 
has focused on actively tracing the contours of objects 
to determine geometry and critical features [7, 27]. This 
work assumes that each measurement is taken with the 
force sensor in a quasi-static state so that normal forces 
and contact locations can be computed. All of this work 
essentially treats the tactile array sensor as a primitive 
form of vision. 

In contrast, a more recent type of contact sensor pro- 
cessing is the detection of information characteristic of 
the dynamic aspects of motion [13]. Mechanical proper- 
ties of objects like mass, friction, and damping can only 
be determined by actively probing and manipulating the 
object. Similarly, the initial contact with an object, and 
slip between the sensors and environment require detect- 
ing relative motion. All of these sensing modalities are 
unique to contact force sensing. 

A few studies have been done on this type of sens- 
ing. By monitoring the acoustic emission from a metal 
gripper, Dornfeld [9] was able to detect the onset of slip 
for some metallic workpieces. Howe and Cutkowsky [12] 
constructed an instrumented latex covered finger. Piezo- 
electric sensors are embedded in the latex cover, and a 
miniature accelerometer is mounted on the inside surface 
of the cover. The piezoelectric sensors are very sensitive 
to strain rate. Because of the small mass of the cover, 
the accelerometers are sensitive to very small forces nor- 
mal to the surface of the sensor. They found that the 
piezoelectric sensor was very sensitive to the changes in 
tangential strain associated with slip, and that the ac- 
celerometer was fairly sensitive to the vibrations normal 
to the sensor associated with slip. 

Bicchi [6] used a six-axis fingertip force-torque sensor 
to estimate the onset of slip (figure 2). This sensor has a 
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Figure 1: Spectrogram of an impact event. The figure shows a contour plot of the energy in frequencies from 200-1350 
Hz as a function of time. The signal was sampled at 2700 Hz. Sixty-four points windowed with a Hamming window 
were used for each fast fourier transform (fft). The fft was computed for every new data point. Note the broad 
frequency band that occurs at an impact and the short time scale of this event. 
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Figure 2: 6-axis fingertip force-torque sensor 



Maltese-cross connecting the outer shell to the base. The 
cross is instrumented with 8 strain-gauge half-bridges. 
The shell has a lightly damped natural frequency of ap- 
proximately 690 Hz when the base is fixed and the shell 
free. In his experiments, Bicchi first determined a coef- 
ficient of friction for the object to be grasped. Then, by 
monitoring the ratio of the tangential force to the normal 
force, he was able to determine when the contact state 
was approaching the slip condition determined earlier. 

All of these methods generally make decisions about 
the contact state based on the instantaneous values of 
the measured signals. In some cases, a lowpass filter 
may be introduced to reduce the "noise" in the signal. 
One exception is McCarragher [20]. McCarragher ex- 
amined the planar assembly process and constructed a 
discrete event dynamic system controller to make deci- 
sions about the current configuration of the parts based 
on the history of force measurements. He used quali- 



tative reasoning on the assembly dynamics to construct 
interpretations of the force signal, and a Petri net to 
provide decision context. 

In contrast, the contribution of this paper is to show 
how the entire history of the signal can be used to make 
decisions in a statistically robust way using the tech- 
niques of sequential decision theory. This technique is 
applicable to all of the sensors that have been investi- 
gated for dynamic contact sensing. In our experiments 
we have applied it to the 6-axis fingertip force-torque 
sensor. 

In this paper, only the individual strain gauge sig- 
nals produced by the sensor (figure 2) are sensed. Each 
signal is treated as independent in order to determine 
what information can be extracted about the contact 
process purely from the characteristics of the scalar sig- 
nal. Without additional measurements or prior knowl- 
edge, all event decisions must be based on characteristics 
of the individual strain time series. We test the signal for 
whiteness, changes in mean, changes in vibration level, 
short discontinuities caused by impulses, and spectral 
structure. The tests are formulated as hypothesis test- 
ing problems based on experimental models of the signal 
characteristics. Section 4 presents our experiments and 
the signal models. Section 5 discusses the theory of se- 
quential hypothesis testing and develops the form of the 
test used in our algorithm. Section 6 shows how the 
algorithm segments some contact signals and discusses 
performance. 



3 Human Capabilities 

Some insights on the design of robot contact sensing al- 
gorithms may be gained by a study of human temporal 
contact sensing. A robot sensing through an intrinsic 
contact sensor or a single piezoelectric sensor is anal- 
ogous to human exploration using a stick. The stick 
encodes the distributed contact information into a tem- 
poral force signal which is transmitted along the length 
of the stick. A surprising amount of information can 
be gained purely through this channel. Our goal in de- 
signing intrinsic contact sensors and algorithms is to un- 
derstand and at least match human performance in this 
mode of exploration. 

Research on human tactile perception has shown that 
there are four types of mechanoreceptors [14, 16, 15]. A 
fair amount is known about the frequency response of 
these receptors to tightly defined inputs. The Merkel 
disks and the Meissner corpuscles are type I popula- 
tions and are near the surface and closely spaced. These 
surface sensors are primarily sensitive to low frequency 
information. The Merkel disks respond to static and 
slow changes with a lowest threshold at about 10 Hz. 
The Meissner corpuscles respond only to changing sig- 
nals with a lowest threshold at about 30 Hz. Most robot 
tactile arrays attempt to duplicate the spacing and ca- 
pabilities of these two sensors. 

The deep skin sensors, type II, are more widely spaced 
with a density of 20/cm 2 and a 10 mm receptive field. 
The Pacianian corpuscles have a sensitive region from 50 
to 400 Hz. The Ruffini organs are directionally sensitive 
and sense skin stretch. 

Johnson [17] provides a recent comprehensive review 
of the tactile sensing system and provides a working hy- 
pothesis for the role played by three of the four basic 
systems: the SAI, RA, and PC systems. The SAI sys- 
tem is the slowly adapting type I population, ending 
in the Merkel disks, and all the pathways conveying its 
signals to memory and perception. Similarly, the RA 
system is the rapidly adapting type I population end- 
ing in the Meissner corpuscles, and the PC system is 
the rapidly adapting type II population ending in the 
Pacianian corpuscles. 

The SAI system is thought to be responsible for en- 
coding low frequency, widely separated, spatial contact 
signals. Experiments with Braille and Roman letters 
show that the SAI system provides an isomorphic im- 
age of the contact signal. Therefore the SAI system is 
thought to be responsible for directly imaging the con- 
tact shape. 

The encoding of texture is still being determined. It 
is clear that the perception of texture requires relative 
movement between the skin and the surface. For widely 
spaced textures, the SAI units are able to determine the 
spatial distribution. However for very fine textures and 
slip, vibration sensing seems to be critical. Both the RA 
and the PC systems appear to be involved. 

Recent studies have shown that slip is detected, for 
textured surfaces, by these rapidly adapting fibers when 
a vibration is set-up by the relative motion of a tex- 
tured surface [26]. The spatial distribution of the con- 
tact is of only secondary importance. Relative motion 



of very small raised dots (4 //m high, 550 //m diameter 
single dots, and 1 //m dots spaced at 100 //m center-to- 
center) on a smooth plate produced reliable activation 
of the fibers. The direction of slip is determined by the 
slowly adapting fibers which measure skin stretch. This 
suggests that the information about the onset of slip is 
carried by the high frequency component of the contact 
force, and that the direction of slip is carried by the di- 
rection of skin stretch. 

In addition, textured surfaces can be differentiated by 
scanning a rigid probe across a surface. The probe en- 
codes only temporal information. The induced vibration 
level seems to be used for discrimination. The PC system 
is the most likely candidate for the transduction system, 
since it is the only system excited by the impulses that 
occur during the initial placement of an object. Neuro- 
physiological study of this mechanism for perception has 
just begun. 

In summary, the SAI, RA, and PC seem to have a 
separation of function loosely based on frequency. This 
separation is used in perceiving the contact state while 
doing manipulation with a tool. The SAI system is tuned 
to determine the contact distribution between the hand 
and the tool. The PC system is able to detect the high 
frequency events that are transmitted down the tool to 
the hand and is insensitive to the shape of the tool. And 
the RA system has a lower spatial sensitivity then the 
SAI system but is more sensitive to vibration and so 
can determine local movements between the tool and 
the hand. 

This work suggests that it is possible to extract infor- 
mation about texture and impacts with an intrinsic con- 
tact sensor. Like the PC system, the algorithms should 
look for high frequency events. In addition, the low fre- 
quency response from the contact sensor should be re- 
lated to the neural encodings for joint torques. Finally, 
results developed for the temporal response from a sin- 
gle contact sensor, may be extendable to analyzing the 
temporal response from a sensor array. 

4 Signal Models 

In order to detect and label different events, models of 
the different signals had to be developed. We built two 
pieces of experimental apparatus in order to character- 
ize the response of the system to impacts, sliding across 
a surface, no contact, and grasping contacts. Statisti- 
cal process models were developed based on the experi- 
ments. These were captured as a set of signal source hy- 
potheses. The next subsection described the experiments 
and the signal models, and the last subsection discusses 
the collection of the models into source hypotheses. 

4.1 Signal Examples and Models 

We considered four basic contact signals: impacts, slip, 
no contact, and grasping contacts. For each basic con- 
tact, an experiment was developed to isolate that par- 
ticular event. Based on these experiments, a statistical 
signal model was developed by testing the description 
against the data. The assumption that these signals 
were basic and could be used to span the contact set 
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Figure 3: Impact Test Apparatus 



was tested by considering some examples of more com- 
plex manipulation. This is discussed in the section 6. 

4.1.1 Impact Model 

To gather impact data, a solenoid released impact 
hammer was placed on a knife pivot (figure 3). The eight 
strain gauge signals resulting from the hammer striking 
the fingertip sensor was recorded at 2.7 kHz. The im- 
pact load could be varied by changing the initial height 
of the force sensor with the micrometer. A representative 
impact signal from one strain gauge and its correspond- 
ing Fast Fourier Transform (FFT) is shown in figure 4. 
There are a number of important characteristics in this 
signal. First, three strikes by the hammer are shown by 
the three sharp rises in the figure. Each of these strikes 
is separated by approximately 150 ms. Second, each of 
the sharp impact signals decays very quickly (approxi- 
mately 1.5 ms). Finally, the sensor continues to ring for 
an extensive period, over 0.5 sec, after the impacts (not 
shown). Finally, the rapid vibration that follows each 
impact events is at approximately 1000 Hz. 

As an initial model of the impact process consider the 
ideal model shown in figure 5. In this model, each impact 
of M2 with M\ is an inelastic collision with coefficient of 
restitution e. The time and distance of the collision are 
small compared to the travel time and distance of M.2- 
Under these assumptions, the process can be modeled 
as a series of impulses applied to M\ . Let X{ be the 
normalized height of M^ after collision i with xo = 1; 
u)q = K\jM\ be the natural frequency of the sensor; 
rj = u)oy/2h/ g be the natural unit of time, with h being 
the true initial height of M^, i% be the time of the ith 
collision with t\ = erj; z be the normalized displacement 
of x\, /? = M2/M1; and C, the damping coefficient which 
is less than 1. The equations of motion for the model 
are then: 
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Using linearity, and the convolution properties of 6, the 
trajectory for Mi is 
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Figure 4: Representative impact signal from one strain 
gauge bridge. There are 128 samples taken at 2.7 kHz. 
The FFT was computed using a square window. The 
signal mean has been removed from the FFT. 



M2 



Interface 



w////////^ 



Ml 



Kl< 



Bl 



Ground 



Figure 5: Ideal model of the impact process. An inelastic 
collision occurs at the boundary over time and distance 
that are small compared to the travel time and distance 
for the impacting mass. 
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Figure 6: Simulation of model response using first 20 
impacts with: e = 0.85, C, = 0.1, (3 = 1, r\ = 5.5 



Simulations of this model result in some of the phe- 
nomena displayed in the actual data. Figure 6 shows 
the sudden increase in displacement, the slow exponen- 
tial decay in the signal, and the discontinuous effect of 
secondary impacts. However, the simulation does not 
display the disorder seen in the actual signal, and is a 
poor match to the data. A more detailed model for this 
process treats the process as an initial condition response 
for a autoregressive (AR) model of unknown order. We 
were able to get good fits to each impact event with 
a fourth order auto-regressive model. An impact event 
lasts for about 50 samples. The first four values of the 
impact signal were used to create the initial conditions 
and the parameters of the autoregressive model were es- 
timated using maximum likelihood (MLE) over the re- 
maining samples. 

The majority of the impact signal energy is in the 
first four samples. The remaining energy is in the ex- 
tended ringing that follows the impact event. The en- 
ergy in the initial part is captured in the model by using 
the first four values as initial conditions. The ringing 
is captured by the autoregressive coefficients. Unfortu- 
nately, the MLE estimator is nonlinear and uses all of 
the data. Therefore the estimates have to be generated 
through numerical search which make the model imprac- 
tical for real-time implementation. Instead we adopted 
the simpler approach of computing a Karhunen-Loeve 
(KL) expansion for the impact shape based on empiri- 
cally segmented training data. 

Seventy-two training example were generated using 
the impact apparatus. The start of the impact was 
found by looking for the first point in the signal that was 
twenty or more standard deviations away from the cal- 
ibrated initial conditions. Forty points were then taken 
from this starting point to get seventy-two examples of 
length forty. The mean was removed from each sam- 
ple individually, and then each sample was normalized 
to have unit energy. Normalization prevented any one 
signal from dominating the result and since the test is 
a ratio between different hypotheses the energy is not 
important. The Karhunen-Loeve expansion, which is an 
eigenvalue expansion of the sample covariance, was then 
computed. The majority of the energy was contained 
in the first four eigenvalues; therefore, these were made 
additional features for the impact model. The features 
were extended beyond forty samples by adding zeros to 



Figure 7: Slip Test Apparatus 



the end of the feature signal. The residual after fitting 
the mean and four features was modeled as white normal 
noise. The resulting impact model for the measurement 
y(t) is 



e(t) 



y^^k-Sk(t) + /« + e(t) 



k=i 



N(0,V). 
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In this model, the measurement, y, is a mean /j, plus a 
linear combination of four features Sk , which are treated 
as being orthogonal to //, and a normal process noise 
e(t). The shape of the impact signal and its residual 
vibration is captured by Sk, the new mean is captured 
by [i. In building this model, the implicit assumption is 
that the shape of the impacts is approximately the same. 
This turned out to be true only in the ideal case. 

4.1.2 Slip Model 

Slip data was collected with the experimental appa- 
ratus shown in figure 7. The sensor was pulled along 
the base with constant normal load. The load could be 
changed by adding weights to the sensor mount. Pieces 
of sandpaper were used to control the surface roughness. 
These were placed part way along the base of the ap- 
paratus. A section of the signal generated from passing 
over the sandpaper is shown in figure 8. The FFT shows 
that the spectrum is essentially flat out to the bandwidth 
of the sensor at which point it rapidly decays. An ap- 
proximate model is then given by a normal white noise 
process 

y(t)~N(»,V). 

This model was also confirmed by a histogram and the 
autocorrelation function. 

4.1.3 No Contact Model 

A sample of the strain gauge signal without any con- 
tact is shown in figure 9. Although the sensor is not 
being contacted, background vibrations are picked-up 
through the base. In this mode, the sensor is acting as an 
accelerometer. The FFT of the signal is again fairly flat, 
and the autocorrelation also indicates that the signal is 
approximately white. For small vibrations, quantization 
noise would be the dominant error which has a uniform 
distribution. However, the vibration levels are such that 
a normal model is a better fit to the data as indicated 
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Figure 8: Representative section of signal generated by 
sliding over a piece of sandpaper. 512 samples taken 
at 2.7 kHz and the corresponding FFT using a square 
window. The signal is fairly flat out to the bandwidth 
of the sensor. 
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Figure 9: Representative rest signal from one strain 
gauge bridge sampled at 2.7 kHz and the correspond- 
ing FFT using a square window. 



by the histogram. Therefore an appropriate model of the 
signal is 



y(t) ~ N(» ,V ). 



This model is distinguished by the particular values of 
Ho an d Vo- To recognize this model these values must be 
calibrated. 



4.1.4 Grasping Contact 

Finally, the fingertip sensor was grasped by hand and 
forces were applied (figure 10). A slip free grasp was 
maintained by relying on human capabilities for detect- 
ing slip. The FFT of the signal shows a lowpass response, 
and an appropriate model is an autoregressive process 



e(t) ~ #(0,17). 
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4.2 Hypothesis Models 

The goal of this work is to segment any strain signal 
into pieces which can be identified with one of the experi- 
mentally selected models. Segmentation can be achieved 
by designing a decision algorithm which is tuned to our 
model set. Therefore we collected the parametric models 



Figure 10: Representative grasp signal. 512 samples at 
2.7 kHz and the corresponding FFT using a square win- 
dow. The fingertip sensor was clamped in place and then 
loads were applied by grasping and twisting the sensor 
housing. 



of the signal into six basic signal sources, or hypotheses: 
Null State (HO): y(t) ~ N(ji , V ) 

New Mean (HI): y(t) ~ N(ji, V ) 

New Noise Level (H2): y(t) ~ N(n , Vi) 
New Mean and 

Noise Level (H3): y(t) ~ 7V(a«i> Vi) 

Impact Signal (H4): y(t) = Y,k=i v kSk{t) + A* + e W 

e(i)~W(0,K) 
Grasping Signal (H5): ?/(t) = ^J =1 a^ - j) + e(t) 

e(i)~W(0,K). 
This set of source can be viewed as asking a set of 
questions about the statistics and spectral properties of 
the strain signal: 

1. Is the signal white or is there significant correlation? 

2. Has the mean of the driving noise changed? 

3. Has the variance of the driving noise changed? 

4. Is the mean and/or variance from the base case of 
no contact? 

5. Are there any impact signatures in the time series? 

A segmentation and identification procedure was de- 
rived based on these parametric models. The procedure 
is based on the generalized likelihood ratio test coupled 
with the minimum description length principle. This ap- 
proach offers a number of advantages over more adhoc 
procedures. First, its model based. The decision proce- 
dure follows directly from the models described above. 
Second, it is an optimal procedure, within the context of 
the models, when such a procedure exists. Lastly, it ex- 
plicitly estimates the time of events, which is a property 
that most filter based approaches lack. 

The next section 5 presents the theoretic basis for 
our decision algorithm. It begins with a slightly more 
abstract form of the problem of statistical segmenta- 
tion and identification in order to frame the discussion. 
The implicit assumptions about the measurement pro- 
cess used in the generalized likelihood ratio are then jus- 
tified. The test is then presented with a sequence of 
examples each of which is interesting in its own. For 
example, the procedure for testing for a change in mean 
from a known value is the optimal guarded move sensing 
strategy. 

Finally section 5.4 discusses the problem of labeling 
multiple parameterized models which is the problem pre- 
sented by our set of hypotheses. In this case, the problem 
of uniformly penalizing the free parameters arises. We 
discuss the two basic approaches that have been used in 
the literature and present a justification for the choice 
of the minimum description length principle (MDL). Fi- 
nally, the theory section concludes by presenting the al- 
gorithm in detailed form including a discussion of effi- 
cient parameter estimation algorithms. Results are pre- 
sented in section 6. We applied the algorithm to both the 
training examples discussed above and on more general 
tasks. 

5 Sequential Hypothesis Testing 

In order to develop algorithms for processing dynamic 
contact information, we introduce a sequential hypoth- 
esis testing model of the sensing process. This area has , 



been an active area of research in statistics and signal 
processing since its initial development by Wald [28]. A 
mathematical review is given by Siegmund [25]. There 
have been a number of important results during the last 
decade [3, 29]. These methods are relevant to any sig- 
nal processing task which can be modeled as a stochas- 
tic measurement process on an underlying system which 
undergoes discontinuous changes. The methods are par- 
ticularly useful when accurate and rapid decisions about 
the time of change are required. This includes edge de- 
tection, continuous speech segmentation, and dynamic 
contact sensing. 

The most powerful hypothesis testing procedure, i.e. 
the one that uses the most prior information, is Bayesian 
decision theory on a semi- Markov chain. An approach 
using this model would estimate the probability of be- 
ing in every node in the chain and would consider every 
possible sequence of transitions on the graph to explain 
the sequence of measurements. This can be computa- 
tionally complex. In addition, this approach requires a 
prior probability distribution for the transition probabil- 
ities, the holding times, and any parameter values which 
are difficult to develop. An alternative procedure is the 
sequential likelihood ratio. 

In sequential hypothesis testing it is assumed that the 
time for the algorithm to detect a transition is short 
compared to the holding time before a second transi- 
tion. Therefore it is assumed: 1) that transitions can 
be detected by considering only the data, 2) that at any 
time only one hypothesis needs to be assumed to be true, 
and 3) only one transition from this hypothesis needs to 
be considered. These assumptions make the problem 
complexity at most linear in the number of samples. 

The next subsection discusses the sequential hypothe- 
sis testing approach in general. Then, a set of important 
special cases is presented. First, the simple hypothesis 
testing problem of testing between known distribution 
is presented. This problem arises often in practice and 
is the easiest to treat theoretically. As an example, we 
show how easily contacts can be detected optimally, and 
demonstrate the increase in performance over an opti- 
mally designed filter and threshold approach. Then the 
procedure for two known signals in Gaussian noise is 
presented. The form of the computations for this test 
appears in all the more complex algorithms. 

Next we consider changes between two parameterized 
distributions. As a special case we consider testing for 
an unknown change in mean. This problem is very im- 
portant. Many problems can be reduced to this problem 
by appropriate preprocessing. In addition, all estimation 
procedures give rise to an asymptotically local procedure 
which looks like a test for change in mean [4]. 

Finally, we consider the problem of multiple param- 
eterized distributions. This is the form of our dynamic 
sensing algorithm takes. The procedure requires a tech- 
nique for uniformly penalizing the number of free pa- 
rameters, and we adopt the minimum description length 
principle. This section concludes by presenting the algo- 
rithm in detail including a discussion of efficient param- 
eter and model order estimation. 



5.1 General Theory 

In general, the measurement could be caused by any one 
of a set of m hypotheses (states) % = {H 8 }. Each state 
provides a statistical description j>i(y(k), ..., 2/(0) of the 
measurement process. y(l) is the first sensor signal gen- 
erated from state i and y(k) is the last. We consider 
only discrete measurements. As a series of sensor mea- 
surements j/q = {2/(0), •••, y(n)} are taken, the problem is 
to determine the sequence of states Xq = {x(0), ...,x(n)} 
from which the measurements were taken. Since it is as- 
sumed that the time between events is sufficiently long 
for an algorithm to detect the transition, the algorithm 
can run forward in time, from the last detected event 
and state and look only for a single transition within the 
data under consideration. 

Initially the algorithm is given that hypothesis H p is 
true. Then, for every time r from to n the likelihood 1 
that the measurements were generated by H p from time 
to time r — 1 and then by a different state H q from time 
r to n is computed. This is compared to the assumption 
that all the data came from H p . Because of the indepen- 
dence assumption of state transitions and the measure- 
ment densities, the likelihood of hypothesis p followed by 
hypothesis q is 

L(P, <1, r, 2/o ) = P P (y(r - 1), ..., y(0))p q (y(n), ..., 2/0)), 

and the likelihood that all of the measurements came 
from state p is 

L(p,y%)=p p (y(n),...,y(0)). 

The optimal test is the likelihood ratio for the measure- 
ments 

L(P, <1, r, 2/o ) _ P P (y(r ~ 1), •••, y(0))p q (y(n), ..., y(r)) 



L{p,Vo) 



P P (y(n),---,y(0)) 



The decision function is the maximum of the log of this 
ratio over the possible new states 

DF(p,q,y )= max log —- — - . 

re[o,n] L(p,y%) 

The most likely new state is q which equals 

q = arg meixDF(p, q, y%). 
i 



This yields the test 



H 4 
DF(p,q,y n ) > T 2 . 



This rule says that H g - will be chosen as the new state if 
DF(p,q,yQ) becomes larger than T 2 otherwise H p will 
be maintained as the current hypothesis. T 2 is the de- 
cision threshold and is a design parameter that controls 
the essential trade-off between the two types of errors. 



J The likelihood is the conditional probability of receiv- 
ing the measurements given the hypothesis. It is related to 
the probability that the hypothesis is true through Bayes 
rule which required a prior distribution. The likelihood is 
used when the prior distributions are either unavailable, or 
assumed to be noninformative. 



There are two important characteristics of this test: 
1) the false alarm rate, 2) the delay to detection. The 
earliest time at which the decision function exceeds the 
threshold, given that the system is still in state p, is 
the false alarm time tf(p) = inf(n : DF(p, 5,2/0) > T 2 ) 
which has distribution Pr^(n,p). The probability of 
no alarm at time n is Pr na{p-,p) = 1 — Pr^(n,p). 
The asymptotic false alarm rate is defined to be f(p) = 

1 — limj,_ >00 p T TN fj c _ 1 \ ■ This reflects the rate at which 
false alarms will occur over the long-term. In contrast, 
the delay to detection is a transient performance mea- 
sure. The delay to detection, given that a change to state 
q occurred at time 0, and the decision function is on state 
q, is t D (p,q) = inf(n : DF(p,q,y%) > T$\x(t) = E q ). 
The distribution of to(p,q) is Yiu{n,p, q) and its ex- 
pected value is io(p,q) = X2t=o ^ ^' I D(i,P, ?)• Both 
statistics are controlled by T p which is a design param- 
eter. Increasing T p decreases the false alarm rate and 
increases the time to detection. Determing both of these 
relationships requires solving a first passage problem. 
Closed form solutions to this type of problem are rare 
and difficult to derive. However, for some of the spe- 
cial cases considered below approximations can be de- 
termined and are presented. 

5.2 Changes Between Two Known Distributions 

The simplest and most important special case is detect- 
ing changes between two known, conditionally indepen- 
dent, probability distributions for a signal. This prob- 
lem contains all of the essential features of the statistical 
decision procedure. More generally, many sequential de- 
cision problems can be treated by designing a prefilter 
which changes the problem into this binary testing prob- 
lem. 

Assume that y(t) is an independent sequence with dis- 
tribution po(y(t)) under hypothesis and J>i(y(t)) under 
hypothesis 1. Further, assume that Ho is the initial hy- 
pothesis. Because y(i) is independent, the probability 
density of receiving a sequence of measurements y^ un- 
der either hypothesis, conditioned on the value of H 8 - is: 



p(j/iiH,o=np,-(i/«) 



t = k 



The likelihood ratio between the two hypotheses is 

£(0,i,^) = p( ^ |H ± ( f" |Hl) (3) 



n 



POtflHo) 

Pi(i/(*)) 
Po(!/W) ' 



(4) 



To simplify the calculations let jo(t) = l°g(Po(j/W))' 
7i(<) = log( Pl (j/(<))), Hoi(<) = 71 (*) " 7o(t), and 
Sl(0, 1) = Yjt=k ^oi(^)- Then the decision function for 
a change from state to state 1 is 

DF(0,l,jtf)= maxlog£(0,l,r,itf) 

r£[0,nj 

which results in the binary rule 



Hi 

DF(0,l,yS) < T 2 . 

Ho 



(5) 



Simulated change in mean at 125 




ROC for Page-Hinkley Test 



Page-Hinkley Test 




Figure 11: Behavior of the Page-Hinkley stopping rule 
to a simulated change in mean at tick 126 for a Gaus- 
sian process. Signal has standard deviation of 1 before 
and after the change, and mean of 1.0 after the change. 
Change is detected with a threshold of 15 at tick 149. 
The estimate of the time of change is the last time the 
test equals zero which is at tick 128. 



Figure 12: Receiver operating characteristic (ROC) of 
Page-Hinkley test between two Gaussian distributions 
with different means and the same variance as a function 
of the signal to noise ratio s = — ^. The log 10 (t/) is 
shown as a function of the mean time to detection id for 
s =0.5, 1.0, 1.5, and 2.0. 



This is equivalent to the Page-Hinkley (PH) cumulative 
sum stopping test 



equivalent false alarm rates. The asymptotic false alarm 



DF(0,l,yZ) 



S?(0,1)- mm Sg(0,l). 

0<j <n 



This test minimizes the time taken to reach decision Hi 
over all tests that have the same false alarm rate [25]. 
Further, it is easily computed recursively by 

DF(0, 1, y n ) = max(0, DF(0, 1, y^ 1 ) + E 01 (n)). 

5.2.1 Change in Mean in Gaussian Noise 

An application of the PH test to a change in mean 
in Gaussian noise is shown in figure 11. This figure was 
generated by adding a mean of 1.0 to a zero mean Gaus- 
sian random sequence with variance 1. With a change 
threshold of 15, the test detects the change at tick 149 
for a delay of 24 ticks. The change time is estimated to 
be 128. In this particular case 

-oi(n) = y (y(n) g ) 

where V is the variance of the signal, and ^ are the 
means under the two hypotheses. DF is simply the cu- 
mulative sum or integration, of Soi with resetting at 0. 
The computation has the same computational cost as 
the alternative of lowpass filtering the signal and thresh- 
olding the result at the optimal value j2. 

5.2.2 Comparison to the filtering approach 

The performance of the PH test can be compared to 
lowpass filtering followed by thresholding on tests with 



rate 



PH 



f and time to detection to for the Page- 
Hinkley test can be approximated by applying Wald's 
identity and approximations [25]. The results are 



PH 



PH 



t F 
i D 



where 



A- 



lo. 





T2 -T 2 -l|//3 
-t 2 +T 2_ 1)//?i 
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fPi(01 

Lpo(oJ 


p,-(CK- 



Since the false alarms are the interarrival times of a 
Bernoulli process they are geometrically distributed. 
Therefore the asymptotic false alarm rate is 

PH * _ 1 
J ~ PH i F ' 

For the change in mean between two Gaussian processes 
with the same standard deviations a, fii is 

A = 1/2 (^ 2 

A plot of the trade-off between the time to detection, 
id, and the time to false alarm,tj is called the receiver 
operating characteristic (ROC). It is a function of the 
signal-to-noise ratio s = — -. Graph 12 shows the value 
of id and log 10 if parameterized by T for a fixed value 
of s. The ROC for this test is shown in figure 12 for 
s = 0.5, 1.0, 1.5, 2.0. Both the mean time to a false alarm 
and decision increase with increasing threshold. At a 
fixed false alarm time, an increase in the signal to noise 
ratio will decrease the time to detection. 

The performance of the alternative test of lowpass fil- 
tering followed by thresholding can be bounded using the 



following asymptotic approximation derived by Hall [10]. 
The approximations are valid in the limit of an increas- 
ing threshold and short sampling time. Consider a filter 
realized by a stable, linear, time invariant vector process 
x 

x(k + 1) = Ax(k) + w(k + 1) + A//u_i(fc - r) 

driven by a white, zero-mean, Gaussian noise w(k) with 
noise intensity Q. A change of size A/j, is applied by the 
unit step m_i at time r. The covariance of x satisfies 
the discrete Lyapunov equation S = ASA T + Q and the 
decision function is DF(k) = x T (k)S~ l x(k). In princi- 
ple it is possible to determine Yifa{^) by propagating 
the density for x(k), p(x,k), forward in time and then 
integrating over the decision region. The propagation 
equation is 



p(x,k+ 1) 



Pw (x-A()p((,k)d( 



where D 

by 



{x : DF(k) <T 2 }. Then Pr FA (fc) is given 



Pr(fc) 

FA 



1 



p(«, k)du. 



Unfortunately there are no closed form solutions to this 
problem. However by treating the discrete system as a 
sampling of a continuous system, an approximation valid 
for large k can be determined. Using this approximation, 
the steady state false alarm rate / is found to be asymp- 
totically bounded by 



{ ln(det( A d ))TP „ 

^ 1 - ex P( v r(p/2 + i) exp 



TV2 (l _ p/T 2 ) 



where p is the dimension of x. In the case of a first-order 
lag filter x(k + 1) = ax(k) + w(k), the bound is 

T 2 /2/ 



/o < 1-exp V 7r /21n(a)Texp- J /2 (1- l/T 2 ) 



This is the bound for x 2 / S > T. The PH test is equiva- 
lent to X/ S 1 ' 2 > T which has a false-alarm rate bounded 
by/o/2. 

To approximate Pio(k) note that DF(k) is a 
noncentral chi-squared random variable with p de- 
grees of freedom and noncentrality parameter 8 2 (k) = 
x 1 ' (k)S~ l x(k) [2]. The process mean x satisfies 

x(k + 1) = A d x(k) + Afi 

with initial condition x(0) = for a change in mean 
of A/j,, where we have assumed for simplicity r = 0. 
If the cumulative noncentral Chi-square distribution of 
DF at value T 2 is denoted by F(T 2 , S 2 ,p), then Pi D (k) 
is bounded by 



Pr(fc) > 1 

D ' ~ 



F(T 2 ,8 2 ,p) 



which can be computed numerically or approximated. 

For a scalar, first-order lag-filter, the ROC can be 
computed as a function of the the signal-to-noise ratio 
s as in the PH test. In this case, the values of id and 
log w if are parameterized by a. The optimal threshold 



for the test is 



4S 



where S 



ill 



(l + a) 



^a 2 . 



This gives a 



ROC for Lowpass Filter Test 




Figure 13: Receiver operating characteristic (ROC) of 
first order lag filter test with threshold between two 
Gaussian distributions with different means and the 
same variance as a function of the signal to noise ra- 
tio s = — -. The log w (if ) is shown as a function of the 
mean time to detection id- 



threshold of T 



(i) 2 il^L 

\2> (l + a)' 
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With the one-sided test, 

an approximation for Viu{k) is simply the probability of 
drawing a value greater that A///2 from a Gaussian ran- 
dom sample which has mean x(k) and variance S, given 
that the test has not already terminated. The probabil- 
ity of terminating at time k given that the test has not 
already terminated is 



F(k) 



The probability of terminating at time k is then given 
by the recursion 

Pd(0) = F(0) 

P D (k) = F(k)(l-P D (k-l)). 

This gives an underestimate of the termination time. An 
overestimate is given by the rise time for x(k) to A///2. 
Figure 13 shows the logarithm of if as a function of 
id for a signal-to-noise ratio ofs = 0.5, 1.0, 1.5, and 2 
computed using these two approximations. The curve for 
s = 0.5 has been cut short, because the approximation 
is not valid for small id- 
Figure 14 indicates that the lowpass filter approach 
has a longer delay to detection compared to the PH test 
when they have the same false alarm rate. The test 
shown in figure 11 will signal an alarm on average every 
6 x 10 6 samples and the change will be detected after 28 
samples. To get equivalent performance from the low- 
pass filter, a must equal 0.98. With this value, the esti- 
mate of io is 29.5 and the rise time is 34.5. These results 
demonstrate that the PH test gives an improvement in 
performance without an increase in computational cost. 
In addition, an estimate of the change time is possible 
with a small amount of additional storage. 



Simulated change in mean at 128 




Lowpass Threshold Test 




Figure 14: Lowpass filter of x(n + 1) = 0.98x(n) + 
0.02j/(n+l) on the same signal as figure 4. The threshold 
is 0.50. The change is detected at 153. This is a slower 
response then the response for the PH test. Further, an 
estimate of the change time is not computed. 



5.2.3 Change between Deterministic Signals in 
Additive Gaussian Noise 

The sequential hypothesis approach can be extended 
to derive tests for characteristics that are detectable with 
a single linear filter. A direct extension to the binary 
problem is given by the hypotheses: 



Ho 
Hi 



y(t) 



8 (t)- 

Sl (t)- 



v (t) 

V1 (t) 



t = 0, 



where s; are known deterministic signals. A test equiv- 
alent to the binary case results with ji(t) redefined as 

ji(t,r) = log(p vi (y(t) - Si (t-r))) 

where p^ i are the known probability densities for the 
noise process. In this case, DF(0, 1, yfi) cannot be com- 
puted recursively. Instead, the maximization must be 
computed by exhaustive search over a growing window 
of length n. To limit the increase in computational com- 
plexity, a suboptimal approach is usually taken where 
the search window is constrained to a moving window of 
fixed length. 

Because of the time invariance of S{ , the complexity 
of the calculation of ^"(O,!) is order Lm where L is 
the number of points in the window and m is the cost 
of computing Soi(n,r) for a single point. To see this, 
suppose that ^"(O, 1) has been computed over the win- 
dow and stored in a vector Soi(n) of size L. When data 
point y(n + 1) arrives Soi(£, r) must be computed for 
every r and t in the window and then summed. How- 
ever, the sum of Soi(£, r) for r and t less than n + 1 
has already been computed and stored in Soi- Thus 
by shifting Soi(n) by one and then adding Soi(n + 1, r) 
to the shifted result we obtain S(n +1). This requires 
order Lm operations. With a vector processor with L 
elements, the entire calculation can be done in parallel 



in m operations. Soi(n + 1) must then be searched for 
the maximum element. This structure for the calcula- 
tions is preserved in the more general case of an unknown 
change magnitude in Gaussian noise, which is considered 
in the next section. 

5.3 Unknown Parameterized Distributions 

In most applications the magnitude of the change is un- 
known. For this problem, there are two probability mod- 
els for y(t) 

Ho: y(t) p o (l/(*).0o) 

Hi: y(t) Pl (!/(*), 0i ) 

where 0] is a known parameter vector and 6\ is unknown. 
0] is an element of ?R q ° , 6\ is element of ?R qi , and y 
is an element of ?R P . The probability densities p and 
p 1 may or may not be from the same family. If the 
dimension and interpretation of 0] and 6\ are the same, 
one approach for detecting the change is to choose a 
minimum change value 86 > a priori and then run 
two Page-Hinkley tests in parallel with 6\ = 0] ± 86. 
The test which terminates first is used as the decision 
rule. Changes of parameter greater than 86 will trigger 
a decision, but the delay to decision will be longer than 
with the correct 86. 

One approach to choosing a value for 86 is to select a 
distribution for 6\ a priori . In this case the measurement 
distribution, under Hi, is the convolution 

Pi(l/(i)|Hi) = Pl (i/(t),0i)®p 9l . 

If 6\ is treated as a sequence of random variables drawn 
from \>0 , then the Page-Hinkley test can be applied as 
above. 

The problem of choosing a distribution for 6\ can be 
eliminated by maximizing C over both 6\ and r. This 
yields the generalized likelihood ratio (GLR) test [30] 



DF(0,l,y%) 



max max S" (0,1). 

0<r<n- qi B 1 



There is a delay of at least q\ samples before detection 
so that 6\ can be estimated. If 6q is also unknown the 
test includes one more maximization. 

5.3.1 Unknown Change in Mean 

As an example of this problem, suppose p s - are both 
Gaussian densities with 6q = (po, Vo) the known mean 
and variance before the change. For simplicity assume 
^0 = 0. After the change the process has a new mean 
pi but the same variance Vo- Then Soi is 

E 01 (p 1 ,k) = pJV ~ 1 y(k)- -pJV ~ 1 p 1 . 

The maximization over p\ yields the usual sample mean 
estimate for p\ from r to n 



pi(r,n) 



1 



(n — r + 1) ^-^ 

V ' k=r 



y(k), 



and the decision function be 



DF(0,l,yS) 



re[0 



max y2z i(pi(r,n),k) 

0,n — 1 * — * 



k=r 



li 



max 

r£[0,n-l] 



p 1 (r,n)V pi(r,n). 



Again, the maximization is limited to a moving window 
of size L since it must be solved by exhaustive search. 

Under hypothesis 0, DF is a central chi-squared 
random variable with p/2 degrees-of-freedom DF ~ 
X 2 (p/2) which has mean p/2. Therefore, the thresh- 
old T must be at least p/2. Since under hypothesis 1, 
DF is noncentral chi-squared with p freedoms and non- 
centrality parameter ■^/jJV^' (j,\, the detection time can 
be approximated by the results in section 5.2.2. 

The test computation can be computed in order Lm, 
or in order m steps by a vector processor of size L. At 
time n the value of jii(r, n) is stored for every point in 
the window in the array M(n). Each column of M is the 
estimate of \i\ involving the final j measurements. After 
receiving y(n + 1), all of the columns of M are shifted 
right by 1 and then each column j is updated by 

M(j, n + 1) = M*(j, n) + (y(n + 1) - M*(j, n))/j 

where M* indicates the shift operation. The decision 
function becomes 

DF(0, 1, j£) = max J -M(j, nfV^Mij, n) 

1<] <L Z 

with r = n - j max + 1. 

5.4 Multiple Parameterized Models 

In the most general case each hypothesis H p in the col- 
lection % could be parameterized by a different number 
of free parameters 6 p . This is the form that our dynamic 
sensing algorithm takes. The decision function is still 
the likelihood ratio but now with a maximization over 
three different parameter sets 



DF(p, q, j/q ) = maxminmaxlog 



L(p,q,r,6' p ,6 q ) 
L(p, r, 6 p ) 



where 6' is the value of the parameters for H p for the 
data from to r — 1. Again, the maximization over r 
is explicit and so only a maximum number of moving 
windows are kept. 

Two additional issues arise in this problem. First, 
models with more free parameters have greater explana- 
tory power. That is they are able to fit finite length 
noise signals better than models with fewer free param- 
eters. Therefore, a method of penalizing the number of 
free parameters which corrects this problem is required. 
The next subsection discusses this issue. 

Second, the autoregressive model hypothesis, hypoth- 
esis 5, does not use a fixed number of parameters, and 
the parameter estimates depend upon the order of the 
model. Therefore, an efficient computationally technique 
for estimating the model parameters for all model orders 
is needed. This is provided by estimation of the reflec- 
tion coefficients which is discussed in section 5.4.2. 

Finally, section 5.4.3 gives the detailed procedure for 
our algorithm. This includes initialization, choice of 
weighting for the number of parameters, the estimation 
procedures, and the decision procedure. 

5.4.1 Model Penalty 

A problem arises with the unadjusted likelihood ra- 
tio test when the test involves models with different 
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numbers of free parameters. Models with more free 
parameters, or degrees-of-freedom, have more explana- 
tory power. That is they are able to fit finite lengths of 
data better than models with fewer degrees-of-freedom. 
Therefore, the model with the greatest number of param- 
eters will always be the most likely. In order to correct 
this problem, a uniform method of penalizing the extra 
freedoms is required. Note that even if all the models 
had the same number of freedoms, the change hypoth- 
esis has twice as many freedoms as the recursive model 
because because it uses two models. 

This problem is ubiquitous to likelihood approaches 
which involved comparisons between models with differ- 
ent numbers of freedoms. This problem is eliminated 
in the Bayesian approach by the a priori choice of prior 
probabilities for the number of models and parameters. 
The log of these prior probabilities adds to the likelihood 
calculation and essential creates an a priori model cost. 
What the likelihood approach requires is a principle for 
choosing these model costs. 

The minimum description length principle [22, 21], or 
MDL, is one of the few general principles for choosing 
the model cost. This principle states that the cost of the 
model is related to the number of parameters or bits it 
takes to encode the model. In general, simpler expla- 
nations are preferred, therefore the likelihood should be 
reduced by the complexity of coding the model. For a 
number of linear parameter estimation problems, Rissa- 
nen has shown that the model cost, defined in terms of 
description length, is asymptotically - z log n where k is 
the number of parameters and n is the number of sam- 
ples used in the estimation. 

An equivalent result was derived by Schwarz [23]. 
Schwarz derived an asymptotic expansion to the optimal 
Bayes procedure by assuming a fixed error cost and fairly 
general conditions for the prior distribution and the sam- 
ple distributions. Under his assumption the Bayes pro- 
cedure chooses the a posteriori most probable model and 
parameter values. Asymptotically this is equivalent to 
maximizing the log-likelihood minus If log n. For auto- 
regressive (AR) processes this procedure was shown to 
be strongly consistent (asymptotically) by Hemerly and 
Davis [11]. 

No such result exists for the still popular, and useful, 
alternative AIC procedure derived by Akaike [1]. Akaike 
suggested that the decision criterion should be based 
on maximizing the expected value of the log-likelihood. 
When all the models use the same underlying probabil- 
ity distribution, the penalty becomes the difference in 
expected log-likelihood between two competing models 
when the actual signal is white noise. For parameter esti- 
mation problems, this gives a penalty equal to the num- 
ber of free parameters. For AR processes, Shibata [24] 
derived the distribution of the number of free parame- 
ters estimate, k, using this penalty. The result shows 
that the most probable k equals k but that the expected 
value of k is generally closer to k + 1. 

All of these results are asymptotic and so for small 
sample sizes, with which all our tests work, there is some 
freedom in choosing the penalty. Because of the stronger 
theoretical justification of the MDL criterion, our algo- 



rithms utilizes a penalty of the form b^log(n) where b 
is a unit parameter cost. In addition, since our proce- 
dures use data sizes of between 20 and 40 samples for the 
moving window the MDL and AIC criterion give nearly 
equivalent penalties. Values of b larger than one favor 
simpler hypotheses more heavily than the MDL crite- 
rion. We obtained subjectively good results with b from 
2.2 to 2.7. 

5.4.2 Linear Models and Orthogonal 
Estimation 

This section discusses the efficient computation of the 
order and parameter estimates for linear predictor mod- 
els in Gaussian white noise. All of the models used in our 
dynamic sensing algorithm fit the framework discussed 
here. 

The general form of a linear predictor model is: 

y(n) = ip (n)6 + e(n) 

where ip T (n) is a vector of regression coefficients, 6 is 
the parameter vector, and e is a white Gaussian noise 
process. For example, for the new mean model HI 
if 1 ' (n) = 1. For the AR model with m free coefficients 
ip T (n) = [y(n - 1), ..., y(n - m)]. 

For the linear predictor model the maximum likeli- 
hood estimate of the parameters is the least-squares es- 
timate. The least-squares estimate can be written in 
matrix form by collecting the measurements into a vec- 
tor as Y(n) = y" , and collecting each element of ip T 
into a vector as s 8 (n) = ip(i)". Then the least-squares 
estimate takes the form 



Tij(n) 


= {si(n)\ Sj (n)) 


X(n) 


n 


Xi(n) 


= (*i(n)\Y(n)) 


X(n) 


t=i 


9(n) 


= 2- 1 (n)^'(n), 



where (|) denotes inner product. X is called the empirical 
information matrix and X is called the empirical infor- 
mation vector. 9(n) is the parameter estimate at time 
n. 

The parameter estimates depend upon the model or- 
der because in general the vectors {.Sj (n)} are not orthog- 
onal. An efficient estimation procedure for the model 
order can be performed by first orthogonalizing the vec- 
tors {sj(n)} and then reconstructing the estimate for 
any desired order. The orthogonal parameter estimates 
are called the the reflection coefficients because they 
are related to the amount of energy reflected back at 
a changing impedance junction in an electrical transmis- 
sion line [19]. 

When the computation is performed on-line only X(n) 
and X(n) are actual kept in memory. Therefore, the 
reflection coefficients need to be computed from these 
matrices. If the vectors {sj(n)} were actually available, 
a Gram-Schmidt procedure could be applied to the vec- 
tors to generate an orthogonal basis. This decomposition 



represents the matrix S = [s{] as S = QD 1 ' 2 R where Q 
is the orthonormal basis, D 1 ' 2 is the diagonal matrix of 
lengths for the orthogonal vectors, and R is the corre- 
lation structure of S. The time dependence has been 
dropped for clarity. Then, the reflection coefficients, K, 
satisfy D X I 2 K = Q T Y. 

Now note that X = 5 T S = R T DR, since Q is or- 
thonormal, and X = S T Y = R T D l l 2 Q T Y . Therefore 
the reflection coefficients are generated by the first stage 
of Gaussian elimination. Gaussian elimination factors X 
into R T DR. Therefore, the reflection coefficients are the 
solution to R T DK = X . This solution is just the result 
of the first stage of Gaussian elimination. The second 
step, back substitution, solves R9 = K. 

Now the reflection coefficients can be used to recon- 
struct the solution for any model order. Given any order 
model m, let the first m reflection coefficients be K m 
and the corresponding submatrix of R be R m . Then the 
original model coefficients for an order m model can be 
determined from K m = R m m . 

More importantly, the reflection coefficients can be 
used to immediately determine the optimal model order. 
Let Eq = (^1^) be the original signal energy. Then be- 
cause of orthogonality, the energy remaining after using 
the m th reflection coefficients is E m = E m -i—k 2 /D mrn , 
and the adjusted log-likelihood of this model is l(Y, m) = 
— 0.5n(log(£' m /n) + 1) — b m ~I ' log(n). The model order 
which maximizes l(Y, m) is optimal under this criterion. 

Although the computations can be reorganized for 
better numerical performance, this is the essence of the 
ladder or lattice algorithm. In summary the estimation 
procedure stores X, X , and the energy in the input signal 
Eq and then performs the following update procedure: 

1. Update the regression vector ip based on the model 
equation. 

2. Update E . 
E + y(n) T y(n) 



E n 



3. Update the information vector and matrix. 

X <- X + i?i) 
X +- X + ip T y(n) 

4. Decompose X into X = R T DR. 

5. Solve for the reflection coefficients. 

R T DK = X 

6. Determine the optimal model order m* by maxi- 
mizing the adjusted log-likelihood l(rn). 



l(m) 



-0.5n(log(£ m /n) + 1) - b [ ^ ' log(n) 
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7. Solve for the optimal model parameters. 

6 m * <— R~ 1 K m * 

When the number of model parameters is fixed, the fast- 
gain algorithm can be used to update the parameter es- 
timates with a fewer number of computations [18] but 
the essentials are the same. 
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Figure 15: Generalized Likelihood segmentation proce- 
dure with multiple hypotheses. A single recursive model 
and a collection of moving models must be updated for 
every new measurement. 



5.4.3 Computational Organization 

The sequential nature of the test leads to an organiza- 
tion of the computations that minimizes the number of 
computations. A conceptual picture of the computations 
is shown in figure 15. At any stage in the computation 
there is one hypothesis being grown recursively and a 
collection of L moving window models for every alterna- 
tive hypothesis. A minimum length L m i„ is chosen for 
the moving windows based on estimation requirements. 

Since all of the hypotheses have linear parameters in 
Gaussian noise, the least squares estimate of the param- 
eters is the maximum likelihood estimate. Therefore, 
for every window w, both recursive and moving, and hy- 
pothesis p we store D p (w) which contains: 1) the current 
value of the empirical information matrix X p (w), 2) the 
current value of the empirical information vector X p (w), 

3) the current value of the parameters 9 p (w), and 4) the 
current value of the adjusted log-likelihood l p (w). 

All of the moving windows for each hypothesis are col- 
lected into a structure M p (n) = {D p (l), ..., D p (lmax)}. 
For window lengths less than Imin, the parameters are 
not estimated. When a new data point y(n + 1) is mea- 
sured the moving windows are updated by: 

1. The values in M p are shifted by 1 for every hypoth- 
esis. 

2. X p (w), X p (w), 6p{w), and l p (w) are updated for ev- 
ery window using the algorithm described in sec- 
tion 5.4.2. 

Let the current hypothesis by H*. For this cur- 
rent hypothesis a collection of recursive windows 1Z = 
{D*(l), ..., D*(lmax)} is stored. In this case each D(i) 
structure contains the information vector and matrix 
based on data from to n — i+1. When a new data point 
y(n + 1) is measured the recursive windows are updated 
by: 

1. The values in 7Z are shifted by 1. 

2. The values in D* (1) are updated using the algo- 
rithm described in section 5.4.2 except now the 
model order is fixed. 

After updating all the moving windows and the re- 
cursive window, the algorithm examines the results to 
detect a change. The likelihood of the current recur- 
sive model £)*(!) plus the threshold X? is compared to 
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the sum of the likelihood for all possible pairs of moving 
models with the matching recursive model. If the change 
likelihood is greater a change is detected. 

Several steps are necessary to update the computation 
after detecting a change. First the change time is taken 
as the new time origin. Then, the number of moving 
models is adjusted so that only models that lie entirely 
after the detected change time are kept. Finally, recur- 
sive models starting at time r, with lengths from L m i„ 
to n — r + 1 must be recomputed in a batch mode. The 
computation then proceeds from this adjusted state. 

To initialize the computation, each hypothesis is eval- 
uated on a minimum length window L m i„ in a batch 
mode. The optimal hypothesis is computed by maximiz- 
ing the likelihoods after adjusting for a model cost. The 
maximal hypothesis is then stored as a single recursive 
model. 

This computation produces at every time t, lagged by 
L max = L m i n + L — 1, a hypothesis model, its associated 
parameter vector, an estimate of y, and the model's ad- 
justed likelihood. In addition, a sequence of event marks 
are produced as each change in model is detected. If the 
hypothesis models are chosen well, the output of the al- 
gorithm is essential a sequence of symbols for the data 
which can be used for further task recognition process- 
ing. 

6 Experiments and Results 

The algorithms developed in the previous section were 
applied to segmenting the contact signal hypotheses de- 
veloped in section 4 on both the selected data samples 
and more general examples. In addition, the algorithms 
were simplified and specialized to particular problems 
to show their utility in detecting important but minute 
changes in real-time. The results should be evaluated in 
terms of three criteria: 1) do the algorithms divide the 
signal into segments that make sense, 2) are the hypothe- 
ses identified by this context free algorithm the ones we 
expect, and 3) could the results be interpreted given a 
context. The results show that the algorithms do a good 
job of segmenting the signal, but that a context free 
interpretation is very difficult. However, by fixing the 
context for simple cases, robust interpretations should 
be possible. 

6.1 Performance on design set 

The general multi-model generalized likelihood ratio test 
was applied to the set of data used for developing the 
model. There were six basic models 

Null State (HO): y(t) ~ N(ji , V ) 

New Mean (HI): y(t) ~ N(jj, V ) 

New Noise Level (H2): y(t) ~ A(jU , Vi) 

New Mean and 

Noise Level (H3): y(t) ~ N(m, Vi) 

Impact Signal (H4): y(t) = J2k=i v kSk{t) + A* + e W 

e(t)~W(0,Vi) 
Grasping Signal (H5): y(t) = Yfj=i ajVi 1 ~ J) + e W 

e(t) ~ N(0,V!). 

For each model, moving windows with 15 to 40 sam- 
ples, inclusively, were used. The number of free parame- 
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Figure 16: Impact Signal incorrectly marked at tick 200. 



Figure 17: Impact Signal correctly marked. 



ters was penalized using the MDL criterion with b = 2.2. 
In addition, impacts cause such large changes in variance 
that changes were signaled when the first data point of 
an impact entered a moving window. Therefore, a per- 
sistence test of 20 was necessary to get the correct mark- 
ing of the change time. The length of minimum window 
was chosen based on the length required for estimation, 
while the maximum was chosen based on the size of the 
persistence test. The decision threshold was set at 12. 

6.2 Performance on task examples 
6.2.1 Impacts 

The training set for impacts was used to get a mea- 
surement of performance. The classification for 72 ex- 
amples was: 
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The results show that impact hypotheses (H4) and the 
change in variance hypothesis (HI) are difficult to distin- 
guish and suggests the need to consider context. Selec- 
tion of H4 depends upon this signal having some consis- 
tency from sample to sample. This is only occasionally 
the case, and so the simpler explanation HI is frequently 
chosen. In real world impact situations, this problem re- 
sulted in confusion of the impact model (H4) with the 
change in new mean and noise model (H3) and the grasp- 
ing signal model (H5). In the case of H5, a set of high 
frequency poles was estimated. 

However, the introduction of context should make dis- 
tinguishing the hypotheses much easier. An example of a 
typical unsuccessful marking is shown in figure 16. The 
top plot shows the signal, the next plot shows the se- 
quence of hypotheses and the transition marks, the third 
plot shows the variance estimate, and the bottom plot 



shows the log of the loglikelihood for the top two hy- 
potheses. Estimates of the coefficients for each of the 
models is also produced by the procedure, but is not 
shown. The figure shows than in unsuccessful marking 
the variance estimate has a sudden increase followed by 
a sudden decrease. It also shows that the likelihood indi- 
cates an event very clearly, at the impact time, and that 
the two top hypotheses, which includes the impact hy- 
pothesis, are very close during the impact. It should be 
possible to use this additional information to add con- 
text and get more accurate markings. Finally, the resid- 
ual vibration after the impact is matched to hypothesis 
3 because of the change in mean caused by the hammer 
resting on the sensor. A successful marking is shown in 
figure 17. The impact signal is modeled as H4 and the 
residual after the impact is modeled with H5. Again, HI 
and H4 are close during the impact event. 

The transition marks are generally close to the actual 
transition. The transition time error for the training set 
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The transition times are within three 87% of the time. 
Errors larger than 3 were caused by small transitions in 
the signal that occurred just before the impact. In this 
case, the persistence test prevents a second change at 
the impact location and thus causes a large error. 

The performance on the training examples indicate 
that impacts are very difficult to detect in a context free 
form. They are easily confused with jumps in the driv- 
ing variance for the other models. This is confirmed 
in a more general impact experiments. To get a sense 
of performance for general manipulation, the fingertip 
was held by hand and then struck against an aluminum 
block. One hundred and three examples were taken, and 



the impact point was selected by looking for the first out- 
lier that was 20 or more standard deviations away from 
the initial signal. The initial part of each of these sig- 
nals is much more variable and the impacts are usually 
followed by a large change in mean from the sustained 
contact. For these examples the classification was: 
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This clearly shows that impacts are confused with H3 
and HI. However, the segmentation times are still very 
good with 93% of the times falling within 3 of the correct 
time. 
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Even in the case of an error, the variance shows a char- 
acteristic increase and then decrease over time. This can 
be used to improve labeling at a context level. 

6.2.2 Changes in texture 

Several basic change in texture experiments were per- 
formed. The first used the slip apparatus to move the 
sensor over the aluminum bottom and then over a piece 
of 180 grit sandpaper. An example segmentation from 
this experiment is shown in figure 18. In all of these ex- 
periments, the high vibration section was matched with 
H5. The moment of impact with the sandpaper is gener- 
ally indicated by a sudden rise in variance. In this par- 
ticular plot, the contact occurs at tick 1240. It was not 
marked because the likelihood did not pass the persis- 
tence test, however the change in variance was detected. 
In all the plots at least two levels of vibration, as indi- 
cated by the variance estimate, are always marked: one 
for sliding on the aluminum, and one for the sandpa- 
per. In figure 18 this can be clearly seen in the variance 
estimate at tick 500 and 1500. 

The importance of this result, and the other texture 
experiments is that substantial (over 500 samples) sec- 
tions of each signal are marked as statistically similar. 
This is a huge reduction in the size of the input to a 
higher level recognizer. The algorithms reduce large 
pieces of the raw signal into a single variance and set 
of estimation parameters. 

This can be clearly seen in two additional texture ex- 
periments. In each of these experiments the sensor was 
moved by hand in order to get smooth, low vibration, 
motion. In the first experiment, the fingertip sensor was 
pulled over a smooth plastic surface and then over 4 
sheets of ordinary paper (figure 19). In the second, the 
fingertip was pulled across a calibrated surface roughness 
made with a lathe (figure 20). 

In figure 19 the transition to sliding over the paper 
is marked at tick 200. The algorithm detects both a 
change in the variance, which is caused by the difference 
in roughness between the two surfaces, and a change in 
the spectrum of the signal. In figure 20 the repetitive 
pattern of the lathed surface texture can be seen in the 
signal. The algorithm models this with the autoregres- 
sive coefficients. The algorithm detects the jump in the 
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Figure 18: Example of change in texture: Sliding over 
an aluminum surface followed by sliding over 180 grit 
sandpaper. 
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Figure 19: Example of change in texture: Section of 
sliding over a plastic table-top followed by four sheets 
of paper. The mark is placed at the point of change 
between the two surfaces. In addition to a change in 
variance, the algorithm detects a change in the spectral 
content of the signal as indicated by the AR coefficients. 
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Figure 20: Example of change in texture: Motion across 
a calibration surface roughness calibration surface. The 
fingertip begins sliding over the surface at approximately 
tick 180. The surface was made with a lathe and has half- 
sinusoidal features of height 0.002 in and width 0.030 in. 



spectrum when the fingertip begins to move across the 
textured surface. If the velocity of the sensor could be 
controlled, or modelled, the autoregressive coefficients 
could be potential used for identifying or sorting tex- 
tures. 

6.3 Summary 

The results show that the algorithm tests for char- 
acteristics that changed when the contact conditions 
changed. The technique produced generally accurate es- 
timates of the change times, and was sensitive to small 
variations in the contact conditions. Our experiments 
showed that changes between small texture variations 
were detectable when the sensor was moved by hand. It 
was less sensitive under computer control, possibly be- 
cause of the additional robot vibration. The results show 
that the algorithm provides a method of discrimination 
between different contact conditions. 

Identification, or labeling, of the correct signal source 
was problematic. This paper discussed work aimed at 
characterizing the statistical source of the data, and then 
using that source as a label. The autoregressive hypoth- 
esis H5 was often selected as the most probable even in 
the developmental data sets. In more general manipu- 
lation, where some low frequency interaction force will 
always be present, this hypothesis will become even more 
probable. Therefore, future work will explore using only 
the AR signal model 

p 
y(t) = n + ^ a-iVi 1 - + e W 



as a basis for segmentation. This model encompasses g< 



almost all of the other models and therefore little seg- 
mentation power will be lost. 

In addition, this model produces segments by fitting 
a spectrum to the local data. Therefore, all of the data 
in a segment will have approximately the same spectral 
content. Preliminary work shows that it may be possible 
to use the AR coefficients for texture sorting. The same 
texture tended to produce an equivalent spectrum when 
the sensor was moved at the same velocity. Additional 
work in adjusting for sensor velocity and characterizing 
the performance of this technique remains to be done. 

In addition, a task level explanation of the results re- 
quires a task model or task context which was delib- 
erately not introduced into this study. Future work is 
aimed at examining this problem. A context can provide 
both a interpretation framework, and a guide for placing 
better segmentation boundaries. The framework may 
depend upon position measurements and should predict 
the expected signal parameters. 

7 Conclusion 



This paper examined the problem of temporal segmen- 
tation of manipulation force signals. Substantially more 
information, useful for manipulation, then contact/no 
contact is carried by the contact force signals. A better 
understanding of the signal characteristics and their rela- 
tionship to task models would provide a rich framework 
for controlling manipulation. 

The first problem in designing a system to interpret 
the force signals, is to segment the signal into statisti- 
cally equivalent classes. The sequential hypothesis test- 
ing method was introduced as a general tool for produc- 
ing segment boundaries, and was applied to the problem 
of segmenting the individual strains from a 6-axis force 
torque sensor. 

Experiments with isolated simple examples was used 
to develop a set of statistical source hypotheses for the 
data. These experiments resulted in the development 
of six basic statistical models. Impact signals were the 
most difficult signal to model and detect. A great many 
procedures and models were attempted before settling 
on a training data based model and the GLR test. 

The GLR test produced accurate segmentation 
boundaries and event time marks. In addition, the re- 
sults showed that an autoregressive model was often the 
most power hypothesis. We expect that this model will 
be sufficient, by itself, for producing good segmentations 
of the signal. 

In addition, the framework produced some simple 
tests which can be used in place of threshold procedures 
and which have better performance and theoretical jus- 
tification with no additional computational cost. An ap- 
propriate filter followed by the Page-Hinkley test can be 
used to test for many simple conditions. In some prelim- 
inary experiments, we were able to detect the onset of 
fingertip slip, by feeding the absolute value of a highpass 
signal into the Page-Hinkley test. 

One of the most important design lessons is that the 
high frequency component of the force signal carries es- 
sential information. During manipulation, the robot fin- 
;ers will exert time varying lowpass forces. Therefore, 



Level 4: Associating Dynamics with Objects 
in the Environment 



Level 3: Characterizing Dynamics of Interaction 



Level 2: Association of Signal Statistics 
with Regions of Space 



Level 1: Network of Expected Signal Statistics 



Level 0: Characterizing Sensor Signal Statistics 



Table 1: Hierarchy of haptic explanations. 



any unexpected events, like impact or slip must be de- 
tected in the high frequency range. Lowpass filtering 
the sensors to eliminate this "noise" removes a major 
information channel. 

Soft covers, used for better gripping, for the aluminum 
fingertips have the effect of lowpass filtering the contact 
signals. Therefore, a two part sensor that uses a high fre- 
quency sensor embedded in the cover, and the fingertip 
sensor for low frequency signals needs to be developed. 
The high frequency sensory can be coarsely distributed 
and have poor directional sensitivity. It's only the vibra- 
tion magnitude that is important. 

Sensors should also be designed with critical damping 
properties. The fingertip sensor is lightly damped and 
rings when excited at high frequencies. Light damping 
makes modeling the force signal source more difficult, be- 
cause near the natural frequency the signal is a reflection 
of the sensor's dynamics and not the input signal. 

Although the full algorithm implemented in this paper 
does not run in real-time, we feel that a less complete set 
of tests based purely on the AR models could be made 
to run in real-time on a DSP processor. The steps in 
the algorithm are highly vectorized and could be run in 
parallel. We are investigating these possibilities. 

Additional discrimination is possible by examining the 
spatio-temporal distribution of the contact signal. Local 
slip could be distinguished from an overall vibration by 
examining this distribution. However, this requires the 
development of new array sensor technologies. 

The procedures reported on in this paper provide a 
powerful front-end to an algorithm for interpreting the 
state of contact. Such an interpretation will require task 
context. This can be provided with several approaches 
each with different levels of explanatory power. We en- 
vision the hierarchy of haptic explanations shown in ta- 
ble 1. This sequence of levels of interpretation provides 
the basis for our long-term research agenda in dynamic 
haptic sensing. Our current research is aimed at inves- 
tigating levels 1 and 2 to provide a contextual basis for 
interpreting the sensor signal statistics, and the bottom 
level was investigated in this paper. 

Level 1 is the lowest level of context. In this level an 
expected sequence or network of expected hypotheses is 
associated with a task. For example in a change of tex- 
ture task, the algorithm would be told that two textures 
are present and that it is expected that a transition from 
one to the other will occur. Each texture would be char- 
acterized by the parameters that the level algorithm is 
expected to estimate for each region. 
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The next level of complexity is to associate the ex- 
pected hypotheses and their parameters with a region 
of the workspace. This approach again builds a graph, 
but now the graph is indexed by regions of space. This 
approach requires measurements of position. 

With the measurement of position, a different graph 
can be constructed (level 3). This graph would predict 
relationships between force signals and the position sig- 
nals. The elements in the graph become the possible dy- 
namic relationships between position and contact force. 
This would require knowledge of the possible physics 
available in the task. 

Finally, at the highest level of context the dynamics 
and the signals are associated with objects in the en- 
vironment. This would require approximate knowledge 
about the shape and associated physics for the objects 
that the robot might interact with. This could either be 
programmed in or be acquired through sensing with a 
combination of haptic sensors and vision. 

This paper addressed the problem of dynamic con- 
tact sensing. Dynamic models of different basic con- 
tact events were developed and used to derive a sta- 
tistical segmentation algorithm based on the general- 
ized likelihood ratio teat. This test provides a power- 
ful model-based framework for developing segmentation 
algorithms. The procedure was shown to be capable of 
segmenting a number of useful signals into statistically 
equivalent pieces. This resulted in a huge reduction in 
the amount of data that has to be given to a higher level 
recognizer. Finally, we presented a framework for future 
study of higher level recognizers and discussed the chal- 
lenges presented by our approach to perception based 
robot programming. 
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